The Sea Level Monitor methodology is described in detail in Deltares (2023). Principles are:
previous_df <- readSeaLevelData(file.path("../../data/deltares/results/dutch-sea-level-monitor-export-stations-2023-11-20.csv")) %>%
filter(station %in% params$station)
Annual average sea level data for the Dutch main stations is downloaded from the Permanent Service for Mean Sea Level site and combined with the Global Tide and Surge Model (GTSM) annual average surge values. In case PSMSL data is not available for the most recent year (2024-1)
# Get data from PSMSL data service
rlr_df <- read_yearly_psmsl_csv(mainstations_df$psmsl_id, filepath = "../../")
In this analysis, measurements over the period 1890 to 2023 are considered.
if(params$monitoryear-1 == max(rlr_df$year)){
cat("Mean annual sea level downloaded from PSMSL are availabale up to ", max(rlr_df$year), ", the time series is up to date. ")
} else {
cat("Mean annual sea level downloaded from PSMSL are only available up to ", max(rlr_df$year), " and thus incomplete for the current analysis. In order to do a preliminary analysis, measurements from Rijkswatersataat Data Distribution Layer will be used for missing year(s). ")
}
Mean annual sea level downloaded from PSMSL are availabale up to 2023 , the time series is up to date.
# Check if PSMSL is up to date for analysis year
if(max(rlr_df$year) == params$monitoryear-1){
refreshed_df <- rlr_df
print("PSMSL time series is up-to-date and is used for analysis")
} else {
print("The PSMSL time series is not complete. An attempt is made to complete the data using RWS DDL.")
if(max(rlr_df$year) == params$monitoryear-2){ # robuuster maken. nu alleen check voor een ontbrekend jaar.
# Check if ddl data from required year exist
required_file <- file.path("../../data/rijkswaterstaat/ddl/annual_means/", paste0(params$monitoryear-1, ".csv"))
if(file.exists(required_file)){
ddl_datayear <- read_csv2(required_file)
} else{
cat("DDL data for year", params$monitoryear-1, "is not available", sep = " ")
}
}
}
FALSE [1] "PSMSL time series is up-to-date and is used for analysis"
# Get GTSM data from local file
gtsm <- read_yearly_gtsm(filename = "../../data/deltares/gtsm/gtsm_surge_annual_mean_main_stations.csv") |>
mutate(year = year(ymd(t)))
# convert rlr tot nap2005
refreshed_df <- rlr_df |>
mutate(
height = rlr_height_mm - as.numeric(`nap-rlr`),
) |>
rename(station = name)
# check if psmsl data need to be completed with ddl data
try(
if(
exists("ddl_datayear") &
!unique(ddl_datayear$year) %in% unique(refreshed_df$year)
){
refreshed_df <- refreshed_df |> bind_rows(ddl_datayear)
},
silent = T
)
refreshed_df <- refreshed_df |>
left_join(gtsm, by = c(station = "name", year = "year")) |>
mutate(
surge_anomaly = case_when(
year >= 1950 ~ (1000 * surge - mean(1000 * surge, na.rm = T)), # meters to millimeters
year < 1950 ~ 0
)
) |>
select(
year,
height,
station,
surge_anomaly
) %>%
bind_rows(
. |>
group_by(year) |>
summarise(
height = mean(height, na.rm = T),
surge_anomaly = mean(surge_anomaly, na.rm = T)
) |>
mutate(
station = "Netherlands"
)
) %>%
bind_rows(
. |>
filter(station %in% c("Vlissingen", "Hoek van Holland", "Den Helder", "Harlingen", "IJmuiden")) |>
group_by(year) |>
summarise(
height = mean(height, na.rm = T),
surge_anomaly = mean(surge_anomaly, na.rm = T)
) |>
mutate(
station = "Netherlands (without Delfzijl)"
)
) |>
addBreakPoints() %T>%
write_csv2("../../data/deltares/results/dutch-sea-level-monitor-export-stations-latest.csv") %>%
filter(year >= 1890)
# previous year did not include gtsm for year 1950.
# Therefore comparison of surge is done for years > 1950
# range(df$surge_anomaly)
# range(refreshed_df$surge_anomaly, na.rm = T)
refreshed_df_filter <- refreshed_df %>% filter(year > 1950 & year < 2023)
ggplot() +
geom_point(
data = previous_df,
aes(x = year, y = height),
shape = "+"
) +
geom_point(
data = refreshed_df_filter,
aes(x = year, y = height),
color = "blue",
shape = 21,
fill = "transparent",
size = 1
) +
facet_wrap(c("station"), ncol = 3)
ggplot() +
geom_point(
data = previous_df,
aes(x = year, y = surge_anomaly)
) +
geom_point(
data = refreshed_df_filter,
aes(x = year, y = surge_anomaly),
color = "blue",
shape = 21,
fill = "transparent",
size = 1
) +
facet_wrap(c("station"), ncol = 3)
This document analyses the sea level trend of the main stations in the Netherlands using different models. Based on geographical coverage and available time series length six stations are considered to be “main tide gauge stations”.
Additionally, to calculate the average sea level and sea level trend along the Dutch Coast, the main station sea level is averaged for each year (virtual station “Netherlands”). Because the station “Delfzijl” has a considerable gap in vertical adjustment for subsidence, we also consider a variant in which Delfzijl is omitted from the main analysis selection (“Netherlands (without Delfzijl)”).
map_stations(df = refreshed_df, mainstations_df = mainstations_df, mainstations_locs = mainstations_locs)
Hoofdgetijstations in Nederland. Er is aangegeven welke stations zijn meegenomen in dit rekendocument.
In this section we look at sea level measurements. The global collection of tide gauge records at the PSMSL was used to access the data. There are two types of datasets the “Revised Local Reference” and “Metric”. For the Netherlands the difference is that the “Revised Local Reference” undoes the corrections from the NAP correction in 2005, to get a consistent dataset. Here we transform the RLR back to NAP (without undoing the correction).
The rlrnap computes the rlr back to latest NAP (ignoring the undoing of the NAP correction) the alpha paramater is the dominant wind direction for the stations, based on de Ronde 2013. Id’s are the station ids in the PSMSL dataset. They may change from year to year as the PSMSL 0 point is arbitary. You can lookup the relevant parameters in the schematic diagram like this LRL diagram for station Vlissingen
knitr::kable(mainstations_df[,c('name', 'psmsl_id', 'msl-rlr', 'msl-nap', 'nap-rlr')], caption = "PSMSL inforamtion on the six mains tide gauge stations in the Netherlands.")
| name | psmsl_id | msl-rlr | msl-nap | nap-rlr |
|---|---|---|---|---|
| Vlissingen | 20 | 6976 | 46 | 6930 |
| Hoek van Holland | 22 | 6987 | 114 | 6873 |
| Den Helder | 23 | 6962 | 16 | 6946 |
| Delfzijl | 24 | 6953 | 130 | 6823 |
| Harlingen | 25 | 7024 | 110 | 6914 |
| IJmuiden | 32 | 7014 | 64 | 6950 |
Sea level measurements for the six main stations (yearly average) are shown in figure @ref(fig:zeespiegelmetingen).
p <- refreshed_df %>%
dplyr::filter(!grepl("Netherlands", station)) %>%
ggplot(aes(year, height)) +
geom_point(alpha = 1, aes(color = station), shape = 21, fill = "white", size = 1) +
geom_line(alpha = 0.5, aes(color = station), linewidth = 0.5) +
xlab("jaar") + ylab("gemeten zeespiegel in mm") +
theme_light() +
theme(legend.position = "bottom")
ggplotly(p) %>% layout(legend = list(x = 0.05, y = 0.95))
Jaarlijks gemiddelde zeespiegel voor de zes hoofdstations langs de Nederlandse kust.
# p
p <- refreshed_df %>%
dplyr::filter(grepl("Netherlands", station)) %>%
ggplot(aes(year, height)) +
geom_point(alpha = 1, aes(color = station), shape = 21, fill = "white", size = 1) +
geom_line(alpha = 0.5, aes(color = station), linewidth = 0.75) +
xlab("jaar") + ylab("gemeten zeespiegel in mm") +
theme_light() +
theme(legend.position = "bottom")
ggplotly(p) %>% layout(legend = list(x = 0.05, y = 0.95))
Jaarlijks gemiddelde zeespiegel voor gemiddelde van stations langs de Nederlandse kust.
# p
refreshed_df %>%
filter(year >= 1900) %>%
dplyr::filter(station == "Netherlands (without Delfzijl)") %>%
dplyr::arrange(-height) %>%
dplyr::select(year, station, height_mm = height) %>%
# dplyr::group_by(station) %>%
dplyr::slice(c(1:5)) %>%
knitr::kable(caption = "Overview of the five highest yearly average water levels for the combined station Netherlands (without Delfzijl) in mm during the considered period. ")
| year | station | height_mm |
|---|---|---|
| 2023 | Netherlands (without Delfzijl) | 152.8 |
| 2020 | Netherlands (without Delfzijl) | 96.6 |
| 2022 | Netherlands (without Delfzijl) | 94.6 |
| 2017 | Netherlands (without Delfzijl) | 94.2 |
| 2019 | Netherlands (without Delfzijl) | 92.0 |
The expected storm surge per year is determined using output of the Global Tide and Surge Model (GTSM) (ref). This calculates the surge given the bathymetry and climatic conditions. Model results are available from 1950 onwards (figure @ref(fig:gtsm-surge)). The model is run each successive year to calculate the annual average wind and air pressure for use in the Sea Level Monitor. The calculated variation in surge (surge anomaly) is subtracted from the measured sea level before the sea level rise is calculated. For the years before 1950, no runs are available and sea level is corrected for average surge only. This correction reduces the variation due to differences in surge per year and allows a more precise estimate of the long-term trend to be made.
p <- refreshed_df %>%
# filter(!grepl("Netherlands", station)) %>%
# filter(station == "IJmuiden") %>%
ggplot(aes(year, surge_anomaly)) +
geom_point(alpha = 1, aes(color = station), shape = 21, fill = "white", size = 1) +
geom_line(alpha = 0.5, aes(color = station), linewidth = 0.75) +
xlab("jaar") + ylab("windopzet in mm") +
theme_light() +
theme(legend.position = "bottom") +
coord_cartesian(xlim = c(1945, params$monitoryear))
# ggplotly(p) %>% layout(legend = list(x = 0.05, y = 0.95))
p
Modelled surge anomaly (deviation of storm surge from long year avearge). The yearly averages surge is calculated for 1950 - now. For earlier years, an average surge is assumed.
The sea level trend is calculated using generalized linear regression. the following models are tested (Deltares 2018 and Deltares 2023):
The trend is calculated for all stations individually, for the mean of all stations (“Netherlands”), and for means of all stations minus station Delfzijl (“Netherlands without Delfzijl”).
In the regression equation, nodal tide is one of the components and is estimated as a sinusoid curve with a period of 18.6 years.
byStation <- refreshed_df %>%
dplyr::group_by(station) %>%
tidyr::nest() %>%
dplyr::ungroup()
selectedmodel <- params$modeltype
models <- byStation %>%
expand_grid(modeltype = selectedmodel) %>%
mutate(modelfunctionname = paste(modeltype, "model", sep = "_")) %>%
# add functions for model calculation
mutate(modelfunctions = map(modelfunctionname, get)) %>%
# add models based on data and functions
mutate(model = pmap(
list(
data,
modelfunctions
),
\(.d, .f) .f(.d)
)) %>%
mutate(
glance = map(model, broom::glance),
rsq = glance %>% map_dbl("r.squared"),
adj.rsq = glance %>% map_dbl("adj.r.squared"),
AIC = glance %>% map_dbl("AIC"),
tidy = map(model, broom::tidy),
augment = map(model, broom::augment),
equation = map(model, function(x) equatiomatic::extract_eq(x))
)
eq <- models %>%
distinct(modeltype, equation) %>%
mutate(equation = paste0("$", equation, "$"))
knitr::kable(eq, escape = F)
| modeltype | equation |
|---|---|
| linear | \(\operatorname{height} = \alpha + \beta_{1}(\operatorname{year\ -\ epoch}) + \beta_{2}(\operatorname{cos(2\ *\ pi\ *\ (year\ -\ epoch)/(18.613))}) + \beta_{3}(\operatorname{sin(2\ *\ pi\ *\ (year\ -\ epoch)/(18.613))}) + \epsilon\) |
| broken_linear | \(\operatorname{height} = \alpha + \beta_{1}(\operatorname{year\ -\ epoch}) + \beta_{2}(\operatorname{from1993}) + \beta_{3}(\operatorname{cos(2\ *\ pi\ *\ (year\ -\ epoch)/(18.613))}) + \beta_{4}(\operatorname{sin(2\ *\ pi\ *\ (year\ -\ epoch)/(18.613))}) + \epsilon\) |
| broken_squared | \(\operatorname{height} = \alpha + \beta_{1}(\operatorname{year\ -\ epoch}) + \beta_{2}(\operatorname{from1960\_square}) + \beta_{3}(\operatorname{cos(2\ *\ pi\ *\ (year\ -\ epoch)/(18.613))}) + \beta_{4}(\operatorname{sin(2\ *\ pi\ *\ (year\ -\ epoch)/(18.613))}) + \epsilon\) |
Autocorrelation with previous year(s) can sometimes explain part of the otherwise unexplained variance. Especially when a trend is detected in the data, autocorrelation with relatively short lags (1 or few years) often occurs. In case of autocorrelation, we recalculate standard errors of the estimated parameters accordingly.
models %>%
mutate(
ACF = map(augment, function(x) fortify(acf(x$.resid, plot = F)))
) %>%
unnest(ACF) %>%
mutate(ACF_pass = (ACF >= lower & ACF <= upper)) %>%
filter(Lag >= 1) %>%
ggplot(aes(Lag, ACF)) +
geom_col(width = 0.4, aes(fill = ACF_pass)) +
geom_vline(xintercept = 8.9, linetype = 3) +
geom_vline(xintercept = 18.6, linetype = 3) +
geom_line(aes(y = lower), linetype = "dotdash", linewidth = 0.5) +
geom_line(aes(y = upper), linetype = "dotdash", linewidth = 0.5) +
scale_x_continuous(
breaks = scales::breaks_pretty(10)
)+
facet_grid(station ~ modeltype) +
theme_minimal() +
theme(
strip.text.y = element_text(angle = 0),
legend.position = "bottom"
)
Autocorrelation plot for selected stations and models.
There appears to be a consistent autocorrelation for all stations and models with a ‘lag’ of one year. The autocorrelation does not influence the value of the calculated trend parameters, but needs to be taken into account when calculating standard errors. The Newey West autocorrelatie term is used to correctly calculate the standard errors.
At station ‘Vlissingen’ there is autocorrelation with a ‘lag’ of 10 years. This could indicate an effect of the 8.8 year ‘lunar perigee cycle’. Because this only occurs at Vlissingen, this tidal component is not further accounted for in the analysis.
There is no apparent autocorrelation with a ‘lag’ of 18.6 years, the ‘nodal tide’ cycle. This is because the nodal tide is already incorporated in the three models.
# Using NeweyWest():
require(sandwich)
models <- models %>%
mutate(
tidy.HAC = map(
model,
function(x) broom::tidy(
sqrt(
diag(
NeweyWest(
x,
lag = 1,
prewhite = F,
adjust = T
)
)
)
)
)
)
models$tidy.HAC <- lapply(models$tidy.HAC,
function(x) {
x %>%
rename(
term.HAC = names,
st.err.HAC = x
)
}
)
The distribution of residuals resembles a normal distribution for most stations and model variants. Station Harlingen is an example where the distribution is out of centre when using the linear model. The width of the residuals distribution is narrower when measurements are corrected for surge prior to application of the models.
models %>%
unnest(c(data, augment), names_sep = "_") %>%
mutate(
surge_correction = case_when(
params$wind_or_surge_type == "GTSM" ~ ifelse(data_year >= 1950, params$wind_or_surge_type, "none")
)
) %>%
ggplot(aes(x = augment_.resid)) +
geom_density(
aes(
fill = surge_correction,
color = surge_correction),
# position = position_identity(),
alpha = 0.5, size = 1
) +
facet_grid(station ~ modeltype) +
geom_vline(xintercept = 0, alpha = 0.5) +
theme(
strip.text.y = element_text(angle = 0),
legend.position = "bottom"
)
Distribution of residuals should not reveal a clear deviation from a horizontal line.
models %>%
# filter(!grepl("Netherlands", station)) %>%
# filter(station == "IJmuiden") %>%
unnest(c(data, augment), names_sep = "_") %>% #str(max.level = 2)
ggplot(aes(data_year, augment_.resid)) +
geom_point(alpha = 0.4) +
facet_grid(station ~ modeltype) #+
lookup <- c(
Constant = "(Intercept)",
Trend = "I(year - epoch)",
u_nodal = "I(cos(2 * pi * (year - epoch)/(18.613)))",
v_nodal = "I(sin(2 * pi * (year - epoch)/(18.613)))",
`+ trend 1993` = "from1993",
`+ square_trend 1960` = "from1960_square",
AR_term = "previousYearHeight"
)
# data wrangling. move to functions.R
all_predictions <- models %>%
mutate(
preds = map2(data, model, add_predictions)
) %>%
dplyr::select(
station,
modeltype,
data,
tidy,
preds) %>%
tidyr::unnest(c(data, preds), names_sep = "_") %>%
tidyr::unnest(tidy) %>%
# str(max.level = 2)
dplyr::select(-std.error, -statistic, -p.value) %>% # clean up
tidyr::pivot_wider(
names_from = term,
values_from = estimate
) %>%
mutate(`data_height-surge_anomaly` = data_height - `preds_surge_anomaly`) %>%
mutate(`preds_height-surge_anomaly` = preds_pred - `preds_surge_anomaly`) %>%
rename(any_of(lookup)) %>%
# str(max.level = 2)
mutate(
nodal_tide =
u_nodal * cos(2*pi*(data_year-epoch)/18.613) +
v_nodal * sin(2*pi*(data_year-epoch)/18.613),
prediction_recalc = case_when(
if("linear" %in% params$modeltype){
modeltype == "linear" ~
Constant +
Trend * (data_year - epoch)# +
# AR_term * data_previousYearHeight
},
if("broken_linear" %in% params$modeltype){
modeltype == "broken_linear" ~
Constant +
Trend * (data_year - epoch) +
# AR_term * data_previousYearHeight +
(data_year >= 1993) * `+ trend 1993` * (data_year - 1993)
},
# if("broken_quadratic" %in% params$modeltype){
# modeltype == "broken_quadratic" ~ Constant + Trend * (data_year - epoch) +
# ifelse(data_year >= 1960, from1960_square * (data_year - 1960) * (data_year - 1960), 0)
# }
)
) %>%
select(
station,
modeltype,
data_year,
data_height,
preds_year,
prediction_recalc,
`data_height-surge_anomaly`,
`preds_height-surge_anomaly`,
nodal_tide
)
ggplot(
all_predictions,
aes(x = data_year)
) +
geom_point(aes(y = data_height)) +
geom_line(aes(y = prediction_recalc)) +
facet_grid(station ~ modeltype)
p <- plot_station(
predictions_all = all_predictions,
stationi = unique(all_predictions$station),
correctionVariant = "GTSM",
modelVariant = unique(all_predictions$modeltype),
printNumbers = F,
startyear = 1900
) +
facet_grid(station ~ modeltype) +
theme(legend.direction = "vertical",
legend.box = "horizontal",
legend.position = c(0.975, 0.025),
legend.justification = c(1, 0),
legend.title = element_blank())
# ggplotly(p) %>% layout(legend = list(x = 0.05, y = 0.95))
p
## gebruik DT in plaats van kableextra
lookup.df <- data.frame(long_term = unname(lookup),
short_term = names(lookup))
library(DT)
parametertable <- models %>%
select(station, modeltype, tidy) %>%
unnest(tidy) %>%
left_join(models %>%
select(station, modeltype, tidy.HAC) %>%
unnest(tidy.HAC),
by = c(
station = "station",
modeltype = "modeltype",
term = "term.HAC"
)
) %>%
mutate(across(where(is.numeric), round, 3)) %>%
left_join(lookup.df, by = c(term = "long_term")) %>%
select(-term, term = short_term) %>%
relocate(term, .after = modeltype)
parametertable %>%
DT::datatable(
options = list(
"digits" = 3
)
)
# kableExtra::kable(
# caption = "Coefficients for all models and stations.",digits = 2
# ) %>%
# kableExtra::scroll_box(height = "500px")
acc_broken_linear <- parametertable %>%
filter(modeltype == "broken_linear") %>%
filter(term == "+ trend 1993") %>%
select(station, p.value )
knitr::kable(acc_broken_linear, caption = "Significance for the acceleration term in the broken linear model for all stations. ")
| station | p.value |
|---|---|
| Vlissingen | 0.132 |
| Hoek van Holland | 0.038 |
| Den Helder | 0.000 |
| Delfzijl | 0.000 |
| Harlingen | 0.000 |
| IJmuiden | 0.615 |
| Netherlands | 0.000 |
| Netherlands (without Delfzijl) | 0.000 |
For the broken linear model, there is a significant acceleration starting in the year 1993 when fitting the average sea level combined for all stations without Delfzijl. For individual stations, the acceleration is not significant for the stations Vlissingen, Hoek van Holland and IJmuiden.
acc_broken_linear <- parametertable %>%
filter(modeltype == "broken_squared") %>%
filter(term == "+ square_trend 1960") %>%
select(station, p.value )
knitr::kable(acc_broken_linear, caption = "Significance for the acceleration term in the broken squared model for all stations. ")
| station | p.value |
|---|---|
| Vlissingen | 0.675 |
| Hoek van Holland | 0.012 |
| Den Helder | 0.000 |
| Delfzijl | 0.000 |
| Harlingen | 0.000 |
| IJmuiden | 0.811 |
| Netherlands | 0.000 |
| Netherlands (without Delfzijl) | 0.001 |
For the broken squared model, there is a significant acceleration starting in the year 1960 when fitting the average sea level combined for all stations without Delfzijl. For individual stations, the acceleration is not significant for the stations Vlissingen, Hoek van Holland and IJmuiden.
Of the two models with an acceleration term, the model with lowest AIC is the preferred model.
models %>%
# filter(station == "Netherlands (without Delfzijl)") |>
mutate(station = as.character(station)) %>%
select(station, modeltype, AIC) %>%
# unite(`modeltype x station`, modeltype, station) %>%
arrange(-AIC) %>%
mutate(modeltype = factor(modeltype, levels=config$runparameters$modeltype)) %>%
mutate(station = factor(station, levels=config$runparameters$station)) %>%
ggplot(aes(x = modeltype, y = AIC)) +
geom_point(size = 3, shape = "|") +
coord_flip() +
facet_wrap("station")
For the combined stations Netherlands and Netherlands (without Delfzijl), the non linear model has the lowest AIC, and is therefore the first candidate for the preferred model. In the next section, it is tested whether the non-linear model explains the observed variation significantly better than the simplest model, the linear model.
For stations Den Helder and Hoek van Holland, the broken squared model is the model with lowest AIC.
At station IJmuiden, all three models have similar AIC.
Considering all stations, the broken linear and the broken squared model describe the observed variation approximately equally well.
The broken linear model is chosen as the preferred candidate because it gave a better explanation of the observation, corrected for the degrees of freedom of the model (AIC criterion). Here, it is tested whether the broken linear model is significantly better model that the most simple model, the broken linear model.
bl <- models %>%
filter(
station == "Netherlands (without Delfzijl)",
modeltype == "broken_linear"
) %>%
select(model) %>%
unlist(recursive = F) %>%
unname()
l <- models %>%
filter(
station == "Netherlands (without Delfzijl)",
modeltype == "linear"
) %>%
select(model) %>%
unlist(recursive = F) %>%
unname()
t <- anova(l[[1]], bl[[1]])
# broom::tidy(t) %>%
# select(term, rss, p.value)
p_value <- t$`Pr(>F)`[2]
if(p_value<0.01) {
alternativemodelaccepted = TRUE
alternativemodelrefused = FALSE
}
The Anova table for model comparison is shown below.
# stargazer::stargazer(
# t,
# type = 'html',
# column.sep.width = '20pt'
# )
t
The acceleration model (broken linear) has one more degree of freedom as the linear model. The broken linear model is significantly better than the linear model (p < 0.001).
Based on the above analysis, the following conclusions are drawn:
Based on variance analysis the broken linear model is significantly better than the linear model. the broken linear model is accepted as the preferred model.